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We address the problem of estimating unknown model parameters and state variables in stochastic 
reaction processes when only sparse and noisy measurements are available. Using an asymptotic 
system size expansion for the backward equation, we derive an efficient approximation for this 
problem. We demonstrate the validity of our approach on model systems and generalize our method 
to the case when some state variables are not observed. 
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Stochastic reaction processes are models for the dy- 
namics of a population of interacting species, e.g., chem- 
ical substances which can participate in several reactions. 
They play an important role in physics, chemistry, and 
biology P, Q- Given the reaction rates of the process, 
the probability of having a specific number of individuals 
(molecules) in the system at a certain time is computed 
by solving the chemical Master equation for which effi- 
cient exact and approximate approaches are known. 

In recent years, this modelling framework has found 
increasing interest in the field of systems biology, where 
it has been applied to gene regulation and protein net- 
works [3|. The fact that here reactions are known only 
qualitatively leaving a variety of numerical rate parame- 
ters unknown leads to a different type of computational 
problem. It becomes necessary to estimate these param- 
eters using discrete noisy observations of the stochastic 
process. The information contained in the observations 
also changes the time evolution of the system's proba- 
bility distribution. This problem of state and parameter 
estimation is known as data assimilation Q. 

When observations are frequent and the process is in 
equilibrium, one may estimate parameters directly by fit- 
ting the stationary distribution or correlation functions 
of the process to their counterparts measured from the 
data. For an application of this method to models of fi- 
nancial markets, see [Hj. But the problem of statistical in- 
ference becomes especially nontrivial when observations 
are sparse. In this case, one would like to apply sta- 
tistically efficient methods such as maximum-likelihood 
(ML) estimators or Bayes estimators. Unfortunately, 
exact computations of such estimators or their simula- 
tion by stochastic techniques such as particle methods or 
Monte Carlo approaches may become time consuming or 
intractable [6|-|8|. To cope with this problem, we present 
in this Letter an efficient approximate solution based on 
an asymptotic system size expansion [Ij. 

We begin by formulating a stochastic reaction pro- 
cess for d interacting species, which participate in sev- 
eral chemical reactions. The statistical dynamics for the 
state X = (xi, . . . , Xd) where xj denotes the numbers of 
individuals (molecules) of species j, is governed by a con- 
tinuous time Markov jump process. The rate of each re- 



action i is given by its rate function /ii(x). Waiting times 
between two consecutive reaction events are exponen- 
tially distributed with rate parameter A(x) = /ii(x). 
At each time such an event occurs, a specific reaction 
i is chosen with probability /ii(x)/A(x) and the state x 
changes deterministically into some other state x' which 
depends on the reaction. 

The probability pt (x) of finding the system in state x 
at any time t evolves through the Master equation 



d 



[pt(x')/9(x|x')-pt(x)/,(x'|x)] , (I) 



where the total rate fg (x'|x) is the sum of the rates hi{x.) 
for all processes which lead from x to x'. 9 denotes a 
vector of parameters which determine the explicit form of 
the hi. This might be, e.g., the rate constants appearing 
in the mass action kinetics of chemical reactions [9|. 

We will now address the problem of estimating the 
parameters 9 from a set of measurements D = {yi}fLi- 
The Yi are noisy observations of the true process x{ti) at 
discrete times ti. A related problem is the computation 
of pt(x|Z)), the probability of the state at time t based 
on past and future observations. 

Statistically efficient methods for parameter estima- 
tion can be based on the likelihood of the observations 
p{D\9). A maximum likelihood (ML) approach would 
maximize p{D\9) with respect to 6, whereas a Bayesian 
procedure would compute a posterior density of param- 
eters p{9\D) oc p{D\9) p{9) when prior knowledge in the 
form of a density p{6) is available. 

It is straightforward to derive a recursion for p{D\9). 
For any time t, we define rf(x) = p{D>t\9,Xt = x), the 
likelihood of future observations D>t = {yi}ti>t condi- 
tioned on the present state x(t) = x. The likelihood of all 
data is then p{D\9) = X]x^'o(x)ro(x), where Po(x) is the 
distribution of the initial state. For times between two 
observations, rt obeys the Kolmogorov backward equa- 
tion 

Vt(x)=E/(^'W[^*W-^*(^')]- (2) 



dt 



Here and in the following, we omit the parameters 9 in 
the rate / for notational clarity. The observations enter 
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through their conditional distributions (assuming inde- 
pendent noise) p(y|x), in the end condition rtj^(x) = 
p(yAr|x(ijv)) and in the jump conditions 



hm r(x, t) 



(3) 



where tj' and denote the left and right side limits. A 
numerical inference method would process a sequence of 
parameter values 9 together with their likelihoods p{D\9) 
either for optimization or for drawing samples from the 
posterior. For each of these values, a full solution of 
Eq. ^ backwards in time is required. 

Given the exact parameters or good estimates, the 
conditional probability pt(x|D) is obtained using the 
Markov nature of x(t) and Bayes' rule as pt{x.\D) oc 
Pt(x|I?<t) rt(x), where pt{yi\D^t) is the conditional dis- 
tribution of the state based only on the observations 
D^t = {yi}ti<t before time t. For times between obser- 
vations, this probability fulfils the forward Eq. ([1]) with 
jump conditions at the observations. This result can be 
further simplified by noting that the conditional process 
is also Markovian. Hence, pt{x\D) itself fulfils a Master 
equation, where the original rates /(x'|x) are replaced 
by time dependent rates (7t(x'|x) which take the obser- 
vations into account. By differentiating pt{'x.\D) with 
respect to time t, using the backward Eq. ^ and the 
forward Eq. ([T]) , the rates of the conditional process are 
found to satisfy 



g,(x'|x) ^ r,(xO 
/(x'|x) n(x) 



(4) 



Note that for times t > after the last observation, 
5t(x'|x) = /(x'|x). Hence, the statistical inference prob- 
lem requires the repeated solution of a matrix linear dif- 
ferential equation with dimensionality of the order M'^, 
where M is the typical number of states accessible to 
each species in the system. 

We next present an approximate solution to this in- 
ference problem which follows van Kampen's idea of an 
asymptotic system size expansion This was devel- 
oped to approximate the solution of the Master equation 
([T]) in the limit when the typical number of individuals 
(molecules) is a macroscopic quantity and fluctuations 
are expected to be small. To lowest order, a classical de- 
terministic rate equation for the process is obtained. The 
next order includes fluctuations which obey a Gaussian 
diffusion process of the Ornstein-Uhlenbeck type. The 
method is asymptotically valid under the conditions of a 
stable classical equilibrium. Formally, the two lowest or- 
der terms in the expansion can be derived by approxi- 
mating the jump process flrst with a diffusion process us- 
ing a Kramers-Moyal expansion followed by a subsequent 
weak noise limit, where the diffusion term is treated as 
a small quantity. In this Letter, we generalize this idea 
for solving the backward Eq. ^ instead. The flrst step 



is a backward Kramers-Moyal expansion jl0| in ([2]) to 
obtain a backward Fokker-Planck equation (BFPE) . The 
corresponding diffusion process has a drift vector f and 
diffusion matrix D defined by 



f(x) = 5]/(x'|x)(x'-x), 



(5) 



D(x) = ^(x'-x)/(x'|x)(x'-x)^. (6) 

We then apply a weak noise expansion by formally rescal- 
ing D — !■ e^D and assuming that typical state vectors are 
close to a nonrandom time dependent state h(t). Setting 
X = b(i) -I- eu and rt(x) = V't(u) we expand the BFPE 
up to order e^. The macroscopic state equation 

b = f (b) (7) 

is obtained from terms of order e whereas from the second 
order, we obtain the partial differential equation 

dti^t + [A(t)u]^ VV;t -t- ^Tr [D(b)VV^] t/;* = (8) 

where V is the vector of derivatives with respect to u 
and Aij{t) — dxjfi{x)\^^^^i^^y If we assume that the 
measurement noise can be modelled by a Gaussian, i.e., 
p(y|x) oc exp[— ||y — x|p/(2(T^)], a simple solution to ^ 
which is compatible with the jump conditions ([3]), yields 



rt(x) 



zit) 



exp 



h) ' s- 



b) 



(9) 



where the matrix S satisfies the differential equation 

S = AS + SA^-D(b) (10) 

and the normalization (r is not a probability) obeys 
z{t) = z{t)Tr[A{t)]. Note, that jump conditions lead 
to discontinuities for both b and S at the times of the 
observations. Hence, to compute our approximate solu- 
tion rt of the backward equation, we only have to solve 
the ordinary differential equations for b and S backward 
in time starting at the last observation b(tjv) — yjv and 
^(iw) = f^I; where I is the unit matrix. In contrast to 
the exact backward equation, the dimensionality of these 
equations is only of the order (P . In this approximation, 
the likelihood piD\9) is given by piD\9) « {2Tr)'^/^z{0) 
where we have assumed an uninformative flat distribu- 
tion for po(x). 

Strictly speaking, the weak noise expansion for the 
backward equation is only valid when the observations 
Yi are sufficiently close to the classical path b. Other- 
wise, the assumption of Gaussian fiuctuations implicit 
in our approximation would not be correct, and large 
deviation techniques would be required. However, our 
likelihood based approach of parameter estimation aims 
at making observations highly probable with respect to 
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TABLE I: Maximum likelihood estimates of reaction con- 
stants based on 11 observations with a = 1. Mean values 
and standard deviations have been obtained by averaging over 
1000 samples from the Lotka-Volterra process. 



parameter 


inference result 


uncertainty 


true value 


a 


(4.0 ± 1.2) X 10"^ 


(0.9 ±0.3) X 10"^ 


4 X 10"^ 


P 


(2.1 ± 0.9) X 10"'' 


(0.5 ±0.4) X 10"'' 


2 X 10"* 


7 


(4.5 ±4.7) X 10"^ 


(1.2 ±0.7) X 10~^ 


4 X 10"^ 


5 


(1.9 ±0.6) X lO""* 


(0.5 ±0.2) X 10"'' 


2 X 10"* 



model parameters. This will drive typical paths suffi- 
ciently close to the observed data to justify the approxi- 
mation for likely parameter values. 

Although ([S]) can be viewed as the backward equa- 
tion for a linear stochastic differential equation, our lin- 
earization differs from that used in the well known ex- 



tended Kalman smoother [ll| approach of data assimila- 
tion which has a similar computational complexity. This 
is based on a linearization applied to the forward filtering 
process pt(x|£'<t) which requires the knowledge of ini- 
tial conditions. Our backward approach can deal more 
robustly with vague initial conditions po (x) . 

The Kalman approach would simply approximate the 
conditional probability pf(x|£)) by that of the linearized 
model estimated in the filtering step. In contrast, we ap- 
ply the combination of Kramers-Moyal and weak noise 
expansions a second time to a (forward) process with 
jump rate (x'|x) given in (j4|) leading to a new lineariza- 
tion at a different state. The drift of the resulting diffu- 
sion process is obtained by expanding ^((x') to first order 
around x in Eq. ^ . Using ^ and the definition of the 
diffusion matrix ([6]), we obtain 



g(x, t) « f (x) - D[b(t)]S"i(t)[x - h{t)] 



(11) 



The subsequent weak noise expansion leads to a Gaussian 
approximation for pt{'x\D) where the mean state vector 
m and and the covariance matrix C evolve according to 



g(m) 



HC + CH 



-D(m) 



(12) 



with Hij[t) — f^2;j5i(^)|x-m(t)' The discontinuities in the 
drift g(x, at the observation times lead to discontinu- 
ities in the first derivative of m and C. When no explicit 
prior knowledge of the initial state is known, we start the 
forward equation with the most likely value m(0) = b(0) 
and with the uncertainty given by C(0) = S(0). 

We have successfully applied our method to observa- 
tions generated by simulating two rather simple, but non- 
trivial reaction systems. The first one, the well-known 
Lotka-Volterra model [1, , consists of two interacting 
species, traditionally named preys and predators. In this 
system four reactions are possible, as both preys Xi and 
predators X2 can be created or destroyed at any point in 
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FIG. 1: Inference results for a Lotka-Volterra process with pa- 
rameters given in Tabled Symbols denote noisy observations, 
thick lines show the posterior mean, and the 95% confidence 
region for parameters fixed at their maximum-likelihood val- 
ues is surrounded by dotted lines. For comparison, the orig- 
inal process is plotted as thin line. Parameter estimates for 
this data set: a = (6.1 ± 1.3) x 10"^ (3 = (2.6 ± 0.5) x 10"*, 
7 = (4.1 ± 0.9) X 10"^, 5 = (1.8 ± 0.4) x 10"*. 



time. The reactions and rate laws are given by 



X\ — 


^ 2X1 




— axi , 


Xi - 


-> 


/i2(x) 


= (3xiX2 


X2 - 


-> 2X2 




= 8x1X2 


X2 - 


-> 


^4(x) 


= 1x2, 



(13) 



where xi denotes the number of preys and X2 the num- 
ber of predators. The second reaction system is a simple 
genetic autoregulatory network [ist , which can be found 
as parts of the transcriptional regulatory network in bio- 
logical cells. It again consists of two species, mRNA Xi 
and protein X2, and four reactions. Their associated rate 
laws are given by 






-» Xx 


/ii(x) 


= a[l - 


Xi 


^ 


/^2(X) 


= Pxi , 





^ X2 


/^3(X) 




X2 


^ 


/l4(x) 


= 5X2 , 



(14) 



where <d{x) is the Heaviside step function. Both mRNA 
and proteins decay exponentially, and proteins are pro- 
duced by translation of mRNA with a rate proportional 
to the mRNA number xi. In order to regulate the con- 
centration of the protein, the production of mRNA is 
down-regulated by a factor 0.01 as soon as X2 increases 
beyond a critical threshold Xc- 

Both reaction systems have been simulated using Gille- 
spie's algorithm IJ]. Observations have been obtained 
at regular intervals and corrupted by Gaussian noise with 
standard deviation a. The exact initial conditions, a;i(0) 
and 0:2(0), have not been used for inference purposes. 

As shown in Table U and Table [TTl the inferred reac- 
tion constants are reasonable close to their true values. 
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FIG. 2: Inference results for the genetic autoregulatory net- 
work with parameters given in Table Ull Symbols and lines as 
in Fig. [T] Results of parameter estimation for this data set: 
a = 1.9xl0"\ 13 = 6.8xl0"^ 7 = 3.0xl0~^ 5 = 4.2xl0"^ 
Xc = 19.7. 

Additionally, our method is quite fast, because only one 
backward integration is needed in each step of the maxi- 
mization algorithm. In fact, its computational complex- 
ity is nearly independent of the number of observations. 

In the case of the Lotka-Volterra model, we used a stan- 
dard gradient-based method for optimization. From the 
curvature of the log-likelihood, we also get predictions for 
the uncertainty of the parameter estimates (shown in the 
third column in table |l| which compare fairly well to the 
average estimation errors obtained from the simulations. 

The success of our method for the case of the ge- 
netic autoregulatory network is somewhat more surpris- 
ing. The discontinuity of the reactions as a function of 
the threshold Xc makes this a highly nonlinear model. For 
this case, we have used the Nelder-Mead simplex method 
[isj for parameter optimization which works without 
computing derivatives. 

Figure [T] shows results of state inference for a single 
Lotka-Volterra process which is based on the maximum 
likelihood parameter estimates. Similar results for the ge- 
netic autoregulatory network are shown in Fig.[2j Here, it 
is even possible to predict the strongly nonlinear behav- 
ior of the system for time points after the last observation 
qualitatively. Further analysis of the approximate poste- 
rior distribution pt{'ii\D) indicates that the prediction is 
indeed well calibrated; i.e., m(t) and <7j{t) = \/Cjj(T) 
are good estimators for x(t) and its fluctuations. Conse- 
quently, our method gives reliable information about the 
state of a reaction system, although the internal noise 
is rather large due to the small number of individuals 
(molecules) in the examples given here. 

Up to now, we have only considered complete observa- 
tions, which provide a measurement for each component 
of X. But often it is difficult or even impossible to observe 
all species of a reaction system. In this case, the last par- 
tial observation yat is not sufficient for initializing b and 
S. Therefore, we define an additional virtual observation 



TABLE II: Maximum likelihood estimates of reaction con- 
stants based on 11 observations with cr = 1. Mean values 
and standard deviations have been obtained by averaging over 
1000 samples from the genetic autoregulatory network model. 



parameter inference result true value 
a (1.8 ± 1.2) X 10"^ 2 X 10"^ 
P (5.7 ± 1.9) X 10~-^ 6 X 10"^ 
7 (4.6 ± 1.2) X 10-2 5 X 10-2 
5 (7.4 ±2.3) X 10"^ 7x 10"^ 
Xc 17.6 ± 3.6 20 



y at ijv which contains only noise- free data for the hidden 
species. Then, the backward integration can be started 
with ^(^(x) — p(yjv|x)p(y|x) and the total likelihood is 
given by p{D\9,y) = p{D,y\9)/p{y\9). Here, we choose 
Pt„(x), the marginal distribution of the prior process, as 
prior distribution for our virtual observation y so that it 
and the denominator of the likelihood cancel out exactly. 
Therefore, it is possible to estimate the parameters and 
to reconstruct the dynamics of unobserved species by just 
maximizing p{D,y\6) with respect to y and 6. 

We are planning to combine our method with a Monte 
Carlo approach in order to sample from the exact pos- 
terior. Using our approximation, we expect to generate 
good proposal paths for an efficient Metropolis sampler. 
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